This is the R project of Fabian and Robin. We decided to analyse data from billionaires.
For that, we found a dataset on kaggle.com: https://www.kaggle.com/datasets/nelgiriyewithana/billionaires-statistics-dataset
The dataset reflects the state of 4 April 2023 and is intended for ‘exploring the global landscape of success’.
In the following, we create several graphs to examine how certain columns of the data relate to wealth. As this dataset only contains billionaires, we won’t focus on the differences in their wealth. Being part of this list already means being incredibly rich. Accordingly, it is not necessary to analyze how much wealth counts as rich.
Load the data into the R session and get an initial overview. Which types are included?
Describe what you need to do before you can prepare and edit the data in the next section!
# Installation of the needed libraries, if not already installed
if (!require(ggplot2)) install.packages("ggplot2")
if (!require(dplyr)) install.packages("dplyr")
if (!require(ggcorrplot)) install.packages("ggcorrplot")
if (!require(ggthemes)) install.packages("ggthemes")
if (!require(tidyverse)) install.packages("tidyverse")
if (!require(gridExtra)) install.packages("gridExtra")
if (!require(plotly)) install.packages("plotly")
# We used these libraries
library(ggplot2)
library(dplyr)
library(ggcorrplot)
library(ggthemes)
library(tidyverse)
library(gridExtra)
library(plotly)
df_billionaires <- read.csv("BillionairesStatisticsDataset.csv")
# display data
head(df_billionaires)
## rank finalWorth category personName age country city source
## 1 1 211000 Fashion & Retail Bernard Arnault & family 74 France Paris LVMH
## 2 2 180000 Automotive Elon Musk 51 United States Austin Tesla, SpaceX
## 3 3 114000 Technology Jeff Bezos 59 United States Medina Amazon
## 4 4 107000 Technology Larry Ellison 78 United States Lanai Oracle
## 5 5 106000 Finance & Investments Warren Buffett 92 United States Omaha Berkshire Hathaway
## 6 6 104000 Technology Bill Gates 67 United States Medina Microsoft
## industries countryOfCitizenship organization selfMade status gender birthDate
## 1 Fashion & Retail France LVMH Moët Hennessy Louis Vuitton FALSE U M 3/5/1949 0:00
## 2 Automotive United States Tesla TRUE D M 6/28/1971 0:00
## 3 Technology United States Amazon TRUE D M 1/12/1964 0:00
## 4 Technology United States Oracle TRUE U M 8/17/1944 0:00
## 5 Finance & Investments United States Berkshire Hathaway Inc. (Cl A) TRUE D M 8/30/1930 0:00
## 6 Technology United States Bill & Melinda Gates Foundation TRUE D M 10/28/1955 0:00
## lastName firstName title date state residenceStateRegion birthYear birthMonth birthDay
## 1 Arnault Bernard Chairman and CEO 4/4/2023 5:01 1949 3 5
## 2 Musk Elon CEO 4/4/2023 5:01 Texas South 1971 6 28
## 3 Bezos Jeff Chairman and Founder 4/4/2023 5:01 Washington West 1964 1 12
## 4 Ellison Larry CTO and Founder 4/4/2023 5:01 Hawaii West 1944 8 17
## 5 Buffett Warren CEO 4/4/2023 5:01 Nebraska Midwest 1930 8 30
## 6 Gates Bill Cochair 4/4/2023 5:01 Washington West 1955 10 28
## cpi_country cpi_change_country gdp_country gross_tertiary_education_enrollment
## 1 110.05 1.1 $2,715,518,274,227 65.6
## 2 117.24 7.5 $21,427,700,000,000 88.2
## 3 117.24 7.5 $21,427,700,000,000 88.2
## 4 117.24 7.5 $21,427,700,000,000 88.2
## 5 117.24 7.5 $21,427,700,000,000 88.2
## 6 117.24 7.5 $21,427,700,000,000 88.2
## gross_primary_education_enrollment_country life_expectancy_country tax_revenue_country_country total_tax_rate_country
## 1 102.5 82.5 24.2 60.7
## 2 101.8 78.5 9.6 36.6
## 3 101.8 78.5 9.6 36.6
## 4 101.8 78.5 9.6 36.6
## 5 101.8 78.5 9.6 36.6
## 6 101.8 78.5 9.6 36.6
## population_country latitude_country longitude_country
## 1 67059887 46.22764 2.213749
## 2 328239523 37.09024 -95.712891
## 3 328239523 37.09024 -95.712891
## 4 328239523 37.09024 -95.712891
## 5 328239523 37.09024 -95.712891
## 6 328239523 37.09024 -95.712891
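To see which types are included, the column classes can be listed directly. The following is a minimal sketch on a small stand-in data frame, not the code used for this report: `df_demo` is a hypothetical name, and its values are copied from the head() output above. It also shows why `gdp_country` has to be cleaned before it can be used as a number.

```r
# Small stand-in for the first rows of the dataset (values copied from the
# head() output above; the real data frame has many more columns)
df_demo <- data.frame(
  finalWorth  = c(211000, 180000, 114000),
  personName  = c("Bernard Arnault & family", "Elon Musk", "Jeff Bezos"),
  selfMade    = c(FALSE, TRUE, TRUE),
  gdp_country = c("$2,715,518,274,227", "$21,427,700,000,000", "$21,427,700,000,000"),
  stringsAsFactors = FALSE
)

# Class of every column: numeric, character or logical
column_types <- sapply(df_demo, class)
print(column_types)

# gdp_country is text because of the "$" and "," characters, so everything
# except digits and the decimal point is removed before converting
gdp_numeric <- as.numeric(gsub("[^0-9.]", "", df_demo$gdp_country))
print(gdp_numeric)
```

On the full dataset, `str(df_billionaires)` or `glimpse(df_billionaires)` would give the same kind of overview for all columns at once.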
Each row represents one billionaire. The columns contain the following details about each person:
The plot is a bar chart with ten bars, one for each of the ten richest billionaires. The height of each bar represents that person’s wealth in US dollars.
The code calculates and visualizes the correlation matrix of numeric variables in the dataset, indicating the strength and direction of linear relationships between measures such as wealth, age, Consumer Price Index, Gross Domestic Product, life expectancy, and other relevant numeric columns.
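A correlation matrix of this kind can be sketched as follows. This is an illustration under assumptions, not the original chunk: `df_demo` is a hypothetical stand-in that reuses a few values from the head() output above, and only the numeric columns are passed to `cor()`.

```r
# Stand-in data: a few columns and rows taken from the head() output above
df_demo <- data.frame(
  finalWorth              = c(211000, 180000, 114000, 107000, 106000, 104000),
  age                     = c(74, 51, 59, 78, 92, 67),
  life_expectancy_country = c(82.5, 78.5, 78.5, 78.5, 78.5, 78.5),
  country                 = c("France", rep("United States", 5)),
  stringsAsFactors = FALSE
)

# Keep only numeric columns; text columns such as country are dropped
numeric_cols <- sapply(df_demo, is.numeric)

# Pairwise Pearson correlations, ignoring missing values pair by pair
corr_matrix <- cor(df_demo[, numeric_cols], use = "pairwise.complete.obs")
print(round(corr_matrix, 2))
```

The resulting matrix could then be passed to `ggcorrplot()` from the ggcorrplot package loaded above to draw the plot.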
In this section, you should perform all the necessary transformations/cleansing/… etc. of the data (Data Munging, Data Cleansing), e.g.:
Get an overview of the transformed data. You can use tools such as glimpse(), skim() and head() to illustrate your explanations.
Is the resulting data what you expected? Why or why not?
# Calculate the total wealth
total_wealth <- sum(df_billionaires$finalWorth)
# Calculate percentage of billionaires wealth compared to global wealth:
result <- (total_wealth * 1e6) / (454.4 * 1e12)
# Sort billionaires by wealth in descending order
sorted_billionaires <- df_billionaires[order(df_billionaires$finalWorth, decreasing = TRUE), ]
# Calculate the cumulative sum of wealth
sorted_billionaires$cumulative_wealth <- cumsum(sorted_billionaires$finalWorth)
# Count the billionaires (richest first) whose cumulative wealth stays within 80% of total wealth
num_billionaires_80_percent <- sum(sorted_billionaires$cumulative_wealth <= 0.8 * total_wealth)
# Count the billionaires (richest first) whose cumulative wealth stays within 50% of total wealth
num_billionaires_50_percent <- sum(sorted_billionaires$cumulative_wealth <= 0.5 * total_wealth)
# Clean 'finalWorth' column and convert to numeric
df_billionaires$finalWorth <- as.numeric(gsub("[^0-9.]", "", df_billionaires$finalWorth))
# Filter the dataset for billionaires with wealth below $25 billion (finalWorth is given in millions of USD)
wealth_filtered <- subset(df_billionaires, finalWorth < 25000)
# Calculate average wealth by industry
average_wealth_by_industry <- tapply(df_billionaires$finalWorth, df_billionaires$industries, mean, na.rm = TRUE)
# Identify the industry with the highest average wealth
most_profitable_industry <- names(average_wealth_by_industry[which.max(average_wealth_by_industry)])
# Create a data frame for plotting
industry_plot_data <- data.frame(
industry = names(average_wealth_by_industry),
average_wealth = average_wealth_by_industry
)
# Sort the data frame by average wealth
industry_plot_data <- industry_plot_data[order(industry_plot_data$average_wealth, decreasing = TRUE), ]
# Convert columns with monetary values to numeric
gdp_wealth_data <- df_billionaires
gdp_wealth_data$finalWorth <- as.numeric(gsub("[^0-9.]", "", gdp_wealth_data$finalWorth))
gdp_wealth_data$gdp_country <- as.numeric(gsub("[^0-9.]", "", gdp_wealth_data$gdp_country))
gdp_wealth_data$tax_revenue_country_country <- as.numeric(gsub("[^0-9.]", "", gdp_wealth_data$tax_revenue_country_country))
gdp_wealth_data$population_country <- as.numeric(gsub("[^0-9.]", "", gdp_wealth_data$population_country))
# Filter out rows with missing values in relevant columns
gdp_wealth_data <- gdp_wealth_data %>% filter(!is.na(finalWorth) & !is.na(gdp_country))
# Filter for billionaires with country
df_geographical <- na.omit(df_billionaires["country"]) %>%
# Group by country
group_by(country) %>%
# Count billionaires per country
summarise(count = n()) %>%
# Sort by count
arrange(desc(count))
# Get finalworth per country
df_worth_country <- na.omit(df_billionaires[c("country", "finalWorth")]) %>%
# Group by country
group_by(country) %>%
# Count billionaires per country
summarise(
count = n(),
total_finalWorth = (sum(finalWorth) / 1000)
# Further aggregation functions can be added here if needed
) %>%
# Keep only countries with more than 4 billionaires
filter(count >= 5)
df_us <- na.omit(df_billionaires[c("country", "industries")]) %>%
filter(country == "United States") %>%
group_by(industries) %>%
summarise(
count = n()
) %>%
arrange(desc(count))
df_china <- na.omit(df_billionaires[c("country", "industries")]) %>%
filter(country == "China") %>%
group_by(industries) %>%
summarise(
count = n()
) %>%
arrange(desc(count))
df_india <- na.omit(df_billionaires[c("country", "industries")]) %>%
filter(country == "India") %>%
group_by(industries) %>%
summarise(
count = n()
) %>%
arrange(desc(count))
# Dataframe for selfMade-country plotting
df_inherit_country <- filter(df_billionaires[c("country", "selfMade")], country != "") %>%
na.omit() %>%
group_by(country, selfMade) %>%
summarise(count = n()) %>%
arrange(selfMade, desc(count))
# delete all entries which are not in the list of top 10 inherited countries
df_inherit_country <- df_inherit_country[df_inherit_country$country %in% df_inherit_country[1:10, ]$country, ] %>%
arrange(country, desc(count))
# Use df_billionaires and correlate selfMade with selected columns
columns <- c("gender", "category", "birthYear", "birthMonth", "age", "finalWorth", "country", "city", "state", "selfMade")
df_temp <- na.omit(df_billionaires[columns])
# Convert columns to factor
df_temp <- as.data.frame(lapply(df_temp, as.factor))
# Convert all factors to their integer level codes
df_temp <- as.data.frame(lapply(df_temp, as.numeric))
# Correlate all columns with each other (including selfMade)
self_made_corr <- cor(df_temp)
# Print out the correlations of selfMade
self_made_corr <- self_made_corr["selfMade", ]
# Drop selfMade
self_made_corr <- self_made_corr[-which(names(self_made_corr) == "selfMade")]
# Remove NA values
df_temp <- na.omit(df_billionaires)
# Filter for self-made billionaires and find the minimum age
youngest_age <- min(df_temp[df_temp$selfMade == TRUE, ]$age)
# Filter out rows with missing values in finalWorth or age
df_final_worth_below_25b <- subset(df_billionaires, !is.na(finalWorth) & !is.na(age) & finalWorth < 25000)
# Count the number of self-made billionaires
self_made_count <- sum(df_billionaires$selfMade == TRUE)
# Create age groups (you can adjust the breaks and labels accordingly)
age_breaks <- c(0, 20, 30, 40, 50, 60, 70, 80, 90, Inf)
age_labels <- c("<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")
df_age_groups <- df_billionaires
df_age_groups$age_group <- cut(df_age_groups$age, breaks = age_breaks, labels = age_labels, right = FALSE)
# Create a summary table of the proportion of self-made billionaires in each age group
summary_table <- table(df_age_groups$age_group, df_age_groups$selfMade)
prop_table <- prop.table(summary_table, margin = 1)
# Get the countries with the lowest tax rates
lowest_tax <- df_billionaires[c("total_tax_rate_country", "country")] %>%
# Remove NA values
na.omit() %>%
# Group by country
group_by(country) %>%
# Count billionaires per country
summarise(
total_tax_rate_country = first(total_tax_rate_country),
count = n()
) %>%
# Exclude Argentina because its listed tax rate of 106.3% is not realistic
filter(country != "Argentina") %>%
# Sort by total_tax_rate_country
arrange(total_tax_rate_country)
highest_tax <- arrange(lowest_tax, desc(total_tax_rate_country))
# Get the countries with the best educational attainment
df_education <- df_billionaires[c("gross_primary_education_enrollment_country", "gross_tertiary_education_enrollment", "country")] %>%
# Remove NA values
na.omit() %>%
# Group by country
group_by(country) %>%
# Count billionaires per country
summarise(
primary_education = first(gross_primary_education_enrollment_country),
tertiary_education = first(gross_tertiary_education_enrollment),
count = n()
# Further aggregation functions can be added here if needed
) %>%
arrange(desc(primary_education))
df_temp_primary <- df_education
df_temp_tertiary <- arrange(df_education, desc(tertiary_education))
lowest_primary <- arrange(df_temp_primary, primary_education)
lowest_tertiary <- arrange(df_temp_tertiary, tertiary_education)
education_corr_df <- df_education
# convert columns to factor
education_corr_df <- as.data.frame(lapply(education_corr_df, as.factor))
# Convert all factors to their integer level codes
education_corr_df <- as.data.frame(lapply(education_corr_df, as.numeric))
# calculate correlation between count of billionaires and primary/tertiary education
education_corr <- cor(education_corr_df)
Summarize the data in a suitable form to answer your formulated question. You should also use suitable visualizations of the transformed and/or aggregated data to support or illustrate your statements accordingly.
You can also use suitable statistical methods or modeling here if they help you with your research question.
We expect that a small number of individuals own a large part of the combined wealth of all billionaires. The same pattern has been reported for the entire world population: “half of the world’s net wealth belongs to the top 1%, top 10% of adults hold 85%, while the bottom 90% hold the remaining 15% of the world’s total wealth, top 30% of adults hold 97% of the total wealth”.
To check this, we look at the wealth distribution among the billionaires in this dataset:
The plot shows that the lower the finalWorth, the higher the frequency. Most billionaires are therefore only slightly above the threshold of 1 billion USD, and only a few have a finalWorth above 25,000 million USD (25 billion USD).
Global wealth was around $454.4 trillion in 2022. Because this dataset contains only billionaires, we cannot calculate how many people own 80% of global wealth. But we can calculate how many billionaires own 80% of the combined wealth of all billionaires:
## All billionaires wealth combined sums up to 12206800 million usd.
## The entire wealth of all billionaires is just 2.686356 % of the global wealth.
## Number of billionaires owning more than 80% of total wealth of billionaires: 1157
The richest 1157 billionaires own 80% of the combined wealth of all billionaires. The dataset lists 2640 billionaires in total, so, assuming it is complete, about 44% of them hold 80% of the wealth.
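The share of global wealth can be re-derived from the printed figures. The sketch below only redoes the arithmetic with the numbers reported above; the variable names are hypothetical.

```r
# Figures taken from the output and text above
total_wealth_musd <- 12206800   # combined billionaire wealth, in millions of USD
global_wealth_usd <- 454.4e12   # global wealth in 2022, in USD

# Convert millions to USD, divide by global wealth, express as a percentage
share_of_global_pct <- total_wealth_musd * 1e6 / global_wealth_usd * 100
print(round(share_of_global_pct, 4))

# Share of all listed billionaires that make up the richest 1157
share_of_billionaires_pct <- 1157 / 2640 * 100
print(round(share_of_billionaires_pct, 1))
```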
Let’s look at how many billionaires own half the entire wealth of all billionaires:
## Number of billionaires owning more than 50% of total wealth of billionaires: 302
Only 302 of all 2640 billionaires (about 11%) own 50% of the total wealth. This suggests that the concentration observed for the world population also holds within the richest group, the billionaires.
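The cumulative-share calculation behind these counts can be illustrated on a toy example. The fortunes below are hypothetical and not taken from the dataset. The sketch also shows a subtlety: counting the entries whose cumulative wealth is still within the threshold gives one person fewer than the smallest group that actually exceeds it.

```r
# Hypothetical toy fortunes in millions of USD (not values from the dataset)
wealth <- c(50, 20, 15, 10, 5)

# Sort richest first and compute each person's cumulative share of the total
sorted_wealth    <- sort(wealth, decreasing = TRUE)
cumulative_share <- cumsum(sorted_wealth) / sum(sorted_wealth)
print(cumulative_share)  # 0.50 0.70 0.85 0.95 1.00

# People whose cumulative share is still within 80% (the approach used above)
within_80 <- sum(cumulative_share <= 0.8)

# Smallest number of people whose combined share exceeds 80%
needed_over_80 <- which(cumulative_share > 0.8)[1]

print(c(within_80 = within_80, needed_over_80 = needed_over_80))
```

With more than two thousand entries, a difference of one person does not change the interpretation.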
To check whether there is a correlation, let’s compare the industries with the GDP:
This plot does not show a clear pattern. Let’s zoom in on billionaires below $25 billion:
There is no obvious pattern or correlation.
To calculate the most profitable industry, we can calculate the average wealth for each industry and compare them:
## The most profitable industry is: Automotive
This bar chart shows that average wealth differs little between industries, so no industry clearly stands out as the most profitable. The results are also imprecise because the industry classifications are vague.
The following bar chart shows the wealth distribution over all industries given in the data set. It also differentiates between self-made wealth and inherited wealth. We also removed the top billionaires above $25 billion.
Billionaires working in Energy, Telecom, Gambling & Casinos, Sports, Service and Construction & Engineering are mostly self-made. Industries like Finance & Investments and Media & Entertainment are dominated by billionaires who inherited their wealth. Overall, the Technology and Manufacturing industries have a lot of billionaires below $25 billion. We should also mention that this data is probably not 100% accurate due to the rough classification of the industry of each billionaire.
First, let’s look at the relationship between GDP and billionaires’ wealth:
There is no obvious correlation or trend. Now let’s look at the relationship between tax revenue and billionaires’ wealth:
This plot shows no obvious correlation or trend between a country’s tax revenue and billionaires’ wealth. Last but not least, let’s look at the relationship between a country’s population and billionaires’ wealth:
Again, the plot shows no obvious correlation or trend. All these plots are biased by the number of billionaires in each country. You can always identify the USA, France and China. The differences between those countries are almost always the reason why no correlation or trend can be identified.
Let’s first examine the number of billionaires per country.
The five countries with the most billionaires are:
## United States : 754
## China : 523
## India : 157
## Germany : 102
## United Kingdom : 82
What we can observe is that the highest number of billionaires resides in the USA, China, and India.
This is unsurprising, given that these countries possess the highest economic power.
Now, let’s turn our attention to the total wealth of billionaires per country.
This graph appears almost identical to the previous one. The most noteworthy observation is that the greatest wealth is concentrated in the USA and China. This might be attributed to the fact that the USA has the highest number of billionaires at 754, and that China has recently transformed from a predominantly agrarian and impoverished nation into the world’s second-largest economy, fueled by market reforms and globalization.
Next, we can gain an overview of the industries in which billionaires are involved per country. Perhaps we can identify a pattern where the industry is closely tied to the country.
Let’s begin by examining the industries in the USA. The majority of billionaires in the US are in the finance and investment industry, followed by the technology and food & beverage industry.
In China, most billionaires are in the manufacturing industry, followed by the technology and healthcare industries.
Like in China, the majority of billionaires in India are in the manufacturing industry. The second-largest group of billionaires in India is in the healthcare sector.
If we delve into the manufacturing industry in both China and India, we discover that China holds the title of the world’s largest manufacturing economy and exporter of goods, as mentioned earlier. India’s manufacturing industry has been experiencing rapid growth in recent years. This growth can be attributed to the comparatively low labor costs in both India and China.
However, the previous analyses of GDP and industries showed no correlation between the industry or GDP and billionaires’ wealth.
The only potential geographical connection is population: China and India each account for roughly 18% of the world’s population. This is the only pattern that could explain the distribution of billionaires across these countries, suggesting that where more people live, there is also a greater potential for individuals to become billionaires.
The country with the most billionaires also tops the list for inherited billionaires, reflecting a correlation between a higher billionaire count and inheritance opportunities. China, ranking second in billionaires globally, exhibits the lowest number of inherited billionaires on the graph, indicating a distinctive wealth distribution pattern. This could be due to fast economic changes and the socialist influences discouraging wealth inheritance.
To check further why billionaires inherit their wealth, we examine if there is a mathematical correlation between the ‘self-made’ attribute and other attributes.
## gender category birthYear birthMonth age finalWorth country city state
## 0.324332535 0.194644898 0.051732894 0.001023638 -0.050433736 -0.045253107 -0.042652994 0.039837195 -0.012873595
The highest correlation is between gender and the self-made status, with a value of 0.32. This is likely because it was historically more challenging for women to establish businesses and accumulate wealth.
The second-highest value is the linear correlation between the industry category and the self-made status, with a value of 0.19. Both of these values, however, are too low to confirm a correlation between the attributes. This holds true for the other attributes in the output as well.
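The correlations above were computed after turning every column into a factor and then into a number. The sketch below uses hypothetical toy values to show what that conversion produces: each value is replaced by the position of its level, and for text columns the levels are sorted alphabetically. For columns with more than two categories, such as country or city, the resulting coefficient therefore depends on alphabetical order and should be read with caution.

```r
# Hypothetical toy values, for illustration only
gender    <- c("M", "F", "M", "M", "F")
self_made <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
country   <- c("France", "United States", "China", "India", "Germany")

# Factor levels are sorted, then replaced by their integer position
gender_codes    <- as.numeric(as.factor(gender))     # F -> 1, M -> 2
self_made_codes <- as.numeric(as.factor(self_made))  # FALSE -> 1, TRUE -> 2
country_codes   <- as.numeric(as.factor(country))    # China 1, France 2, Germany 3, ...

print(gender_codes)
print(self_made_codes)
print(country_codes)

# For two-level columns the codes still carry meaning, so this correlation is
# interpretable; for country the numeric order is purely alphabetical
print(cor(gender_codes, self_made_codes))
```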
This plot gives an overview of the self-made or inherited wealth of each billionaire by age. The x-axis shows the age from 1 to 100+ and the y-axis shows the wealth. Blue dots indicate self-made wealth, while red dots indicate inherited wealth.
Let’s also look at the youngest self-made billionaire:
## The age of the youngest self-made billionaire is 28 years.
This indicates that the billionaires in this dataset who are younger than 28 inherited their wealth. Above that age, there is no obvious pattern. The only thing worth mentioning is that the richest billionaires are mostly self-made. This may be a result of inflation and the growing gap between rich and poor. Let’s zoom in on billionaires with a wealth below $25 billion:
This also shows no obvious pattern.
Let’s also look at a bar chart showing the amount of billionaires for each age:
Here you can see that self-made billionaires are generally younger than billionaires who inherited their wealth. There is also a notable peak of self-made billionaires at about 58 years of age. This may indicate that people born about 60 years ago had particularly good conditions for building wealth.
First let’s look at how many billionaires are self-made:
## Number of self-made billionaires: 1812 /2640
Out of the 2,640 billionaires worldwide, 1,812 (about 69%) are considered self-made, meaning they accumulated their wealth through entrepreneurship and business endeavors rather than inheriting it. This trend can be attributed to the increasing opportunities for innovation and entrepreneurship in the global economy, fostering a conducive environment for individuals to create and grow their own businesses. Factors such as technological advancements, globalization, and access to capital have empowered individuals to pursue entrepreneurial ventures, leading to a significant number of self-made billionaires. Additionally, the rise of industries like technology has provided platforms for innovative minds to disrupt traditional business models and amass substantial fortunes.
Are younger billionaires more likely to be self-made?
No, young billionaires are not more likely to be self-made. Between the age of 30 and 90, most billionaires are self-made.
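The age-group comparison relies on `cut()` with `right = FALSE` and on row-wise proportions. The sketch below uses hypothetical ages and self-made flags (not values from the dataset) together with the breaks and labels defined earlier, to show into which group a boundary age such as 30 falls and how the share of self-made billionaires per group is obtained.

```r
# Breaks and labels as defined in the data preparation above
age_breaks <- c(0, 20, 30, 40, 50, 60, 70, 80, 90, Inf)
age_labels <- c("<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")

# Hypothetical ages and self-made flags, for illustration only
ages      <- c(19, 29, 30, 58, 90)
self_made <- c(FALSE, FALSE, TRUE, TRUE, TRUE)

# right = FALSE makes the intervals closed on the left: [30, 40) is "30-39"
age_group <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)
print(as.character(age_group))

# Counts per age group and self-made status, then proportions within each row
counts <- table(age_group, self_made)
shares <- prop.table(counts, margin = 1)
print(shares)
```

Age groups without any entries show up as NaN in the proportion table, because their row total is zero.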
First of all we wanted to find out which countries have the lowest tax rates.
We also take a look at the countries with the highest tax rates to compare these with each other. Georgia has the lowest tax rate at 9.9%. There is only one billionaire in Georgia. The first graph depicts a total of 155 billionaires. The country with the highest tax rate is Colombia, with a rate of 71.2%. Colombia also has only one billionaire. The second graph shows a total of 1019 billionaires.
When comparing these plots, we can observe that more billionaires live in countries with higher tax rates. This result is strongly influenced by China; however, even when China is excluded, the graph with higher tax rates still contains more billionaires than the one with lower tax rates.
This observation, though, is not representative enough to conclude that countries with higher tax rates have more billionaires. Further studies would be needed to draw a firm conclusion.
Let us first get an overview of the number of billionaires.
## Amount of billionaires in the first five countries with the best primary education:
## Nepal : 1
## Sweden : 26
## Brazil : 44
## Colombia : 1
## Morocco : 2
## Amount of billionaires in the first five countries with the best tertiary education:
## Greece : 3
## Australia : 43
## South Korea : 29
## Argentina : 4
## Spain : 25
Let’s do the same for the countries with the lowest educational attainment:
## Amount of billionaires in the first five countries with the lowest primary educational enrollment:
## Nigeria : 3
## Romania : 3
## Armenia : 1
## Turkey : 25
## Tanzania : 1
## Amount of billionaires in the first five countries with the lowest tertiary educational enrollment:
## Tanzania : 1
## Uzbekistan : 1
## Nigeria : 3
## Nepal : 1
## Cambodia : 1
If we compare those, we can clearly see that there are more billionaires in the countries with higher enrollment (74 and 104 in the top five for primary and tertiary education) than in the countries with lower enrollment (33 and 7).
This could be an indicator that educational attainment is a factor in becoming a billionaire.
Now let’s calculate whether there is a correlation between the number of billionaires and primary/tertiary education.
## Correlation: billionaires - primary education: 0.03536343
## Correlation: billionaires - tertiary education: 0.2497601
The correlation between the number of billionaires and primary education is 0.035. The correlation for tertiary education is 0.250.
From this, we can deduce that there is practically no linear connection between primary education and the number of billionaires.
However, tertiary education shows a weak positive relationship with the number of billionaires.
This could make sense since tertiary education represents the highest level of education.
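Because the preparation code converts each column to a factor and then to numeric before calling `cor()`, numeric values are replaced by their position among the sorted distinct values. The sketch below uses hypothetical per-country values (not taken from the dataset; `factor_codes` is a helper introduced only for this illustration) to show that, when all values are distinct, this is the same as a Spearman rank correlation and can differ from the Pearson correlation on the raw values.

```r
# Hypothetical per-country values, for illustration only
tertiary_education <- c(88.2, 65.6, 40.1, 51.0, 94.3)
billionaire_count  <- c(754, 35, 3, 157, 29)

# Helper for this sketch: factor conversion followed by numeric conversion
factor_codes <- function(x) as.numeric(as.factor(x))
print(factor_codes(tertiary_education))  # positions among the sorted values

# Correlation of the codes, as in the approach used above
cor_codes <- cor(factor_codes(tertiary_education), factor_codes(billionaire_count))

# Rank-based (Spearman) and raw-value (Pearson) correlations for comparison
cor_spearman <- cor(tertiary_education, billionaire_count, method = "spearman")
cor_pearson  <- cor(tertiary_education, billionaire_count)

print(c(codes = cor_codes, spearman = cor_spearman, pearson = cor_pearson))
```

If the goal is a linear correlation on the original scale, calling `cor()` on the numeric columns without the factor conversion would be one option.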
Summarize your research question and your findings from your analysis here. Are your findings what you expected? Why or why not?
This analysis aimed to examine patterns and correlations related to billionaire wealth, demographics, geography, industries, and national economic factors.
The key questions explored were:
The main findings were:
Overall, the findings show some interesting patterns but do not strongly validate many of the initial hypotheses. The limitations of the data likely prevented finding stronger correlations between industries, GDP, geography and other factors. More extensive data would be needed to draw firmer conclusions. The analysis provides a useful starting point for understanding the demographics and distribution of billionaire wealth.